Poster: Concerned with Copyright Date: May 8, 2003 11:09am
Forum: web Subject: Re: Removal of Site- not from google

> american society ethics education
>
> edethics.org definitely comes up in the top 10, still there.

Yes, when you type in essentially the entire name of our organization, it shows up at number six, below other sites dealing with:

1. Agronomy
2. Earthlink
3-5. Engineers

None of which pertain specifically to education.

The real concern is that essentially no one has ever searched for our site using our full name. The top search phrase (according to our server logs) is "ethics in education".

When that is entered, using the quote marks, the site appears at the bottom of the 4th page on Google. When the quotes are removed, the site plunges to the 11th page on Google. For several years, until just a few months ago, the site was at or very near the top for that phrase.

The most disturbing part, however, is that Google bases their "ranks" on the number of links sites have INTO them from other good sites. Well, our site has plenty of them: according to Google's own check, just a few weeks ago when I started investigating this, there were at least TWO HUNDRED such links into our site. Now, this is where it gets really interesting. To check how many sites Google is using to determine your page's rank, Google recommends that you type "link:www.yourwebsite.com" (without quotes) into their search.

When you enter "link:www.edethics.org" into Google in this manner (which, as stated above, is an important method they use to help determine rankings), you will note that this "search" states that the site doesn't appear!!!!!

Now do a regular search using the following phrase (including quotes):

"www.edethics.org"

Guess what? Searching just for the domain name brings up 194 results on Google. This number, however, is NOT used to calculate rankings. So, what's going on? When I wrote to Google, they couldn't figure it out either - and noted that something appeared to be very wrong. Allegedly, they're still "looking into it."

Even more disturbing is that none of the other sites I maintain appear using this method (link:www.yourwebsite.com) either.

Any ideas? Is my site being penalized because it was removed from archive.org???

Poster: Administrator, Curator, or Staff Jonathan Aizen Date: May 8, 2003 11:51am
Forum: web Subject: Re: Removal of Site- not from google

You should note that the Internet Archive is in no way affiliated with Google. I'm curious where you got that impression.

But in answer to your question, you are absolutely not being punished for having your site removed. As the site author, that decision is up to you. It's unfortunate, in my opinion, that you've decided to exclude your site from this permanent Archive, one which our grandchildren's children will be able to use and learn from, but doing so is your own decision, and your site is not being penalized for it.

This post was modified by Jonathan Aizen on 2003-05-08 18:51:44

Poster: Concerned with Copyright Date: May 8, 2003 1:35pm
Forum: web Subject: Re: Removal of Site- not from google

Thank you for your response. In response to your question about the Internet Archive and Google (are you saying that the Internet Archive is not affiliated with Alexa either?), I would first like to direct your attention to the Alexa site:

http://pages.alexa.com/company/partners.html

You will see that they have a "partnership" with the Internet Archive, as well as with Google. There are several other examples of this partnership on the Alexa site.

Regarding a partnership with Google: as Alexa is partnered with both archive.org and Google, it is logical to suggest that there is, even if in a roundabout manner, a link between archive.org and Google. In support of this, please look at the Council on Library and Information Resources site at:

http://www.clir.org/pubs/reports/pub106/web.html

"Alternatively, consider the model of Alexa Internet and the Internet Archive. Alexa Internet is a for-profit corporation that measures the quality of Web pages by tracing consumers' use of the Web. These measurements are made using an enormous Web archive, built by Alexa Internet using Web "spiders" (robots or agents) that roam the Web copying everything they find, unless forbidden entry. In this model, commercial use provides a viable economic base for the creation of the Web archive; note that Yahoo!, Google, and other search engine companies have also built large Web archives for commercial purposes. Alexa Internet then turns over the Web archive to the nonprofit Internet Archive, which provides for long-term preservation of the digital archive."

When one enters the site, www.edethics.org, into an Alexa search, you will see that the information listed there is very incomplete, almost as if the site didn't exist for most of the time it has been on the 'net. Interestingly, Alexa, unlike Google, has done a rather large crawl of the site; in fact, they downloaded nearly 20 MB of material about a month ago! That the information is so limited and incomplete also indicates some kind of problem with Alexa.

While I appreciate your response, and it is rather altruistic, it is far better for people to publish their family history instead (I have). Also, in the case of this site, we document very real evils in our public schools and the evil people (I mean that in the most literal sense of the term) who harm children and their teachers. While we do not post things that are not true, we do document the personal horrors inflicted upon wonderful teachers and their students: the site is filled with real stories of nervous breakdowns, numerous cases of post-traumatic stress disorder, bankruptcies, and other horrible things caused by sick school administrators and corrupt teachers' unions. Sometimes the teachers who have posted their stories there ask for them to be edited or removed, and we honor all such requests, even though these stories are completely true. These teachers certainly don't want to revisit their horrors on archive.org after they have asked us to delete their stories.

Another very important issue is that the evil school districts and corrupt unions that hate us (we're glad) because we tell the truth have actually used our site (in fact, they printed out every page of it) as "evidence" to continue their vile attacks on innocent people. The last time that happened, a sleazy school district and their shysters (again, that's a literal term) tried to retaliate against a teacher by trying to take away his teaching credential. The teacher had previously brought that evil place down when he prevailed in a US Dept. of Education, Office for Civil Rights (OCR) case against the district for its intentional racist practices and harm to children with severe needs. Those phony charges were dropped, and OCR determined that the district had retaliated against him. The last thing we want is for deep-pocketed districts and their shysters to copy complete sets of our site, and archives of that site, to further harm innocent people.

As there is a connection between archive.org and Alexa (which is also partnered with Google), I remain completely perplexed as to the sudden (and significant) drop in the site's ranking and the plunge in the number of visitors.

Thanks.

Poster: Administrator, Curator, or Staff Jonathan Aizen Date: May 8, 2003 1:45pm
Forum: web Subject: Re: Removal of Site- not from google

The partnership with Alexa is exactly as you quoted: Alexa donates crawls of the Web for the Wayback Machine. As far as I know, IA has no ties to Google, despite Google's amazing service.

I appreciate your concerns about not wanting people to be able to easily rip your website's contents from an archive such as this. I won't try to argue against it, and again, that's why you are given the choice to have your contents excluded.

There is no conspiracy here - your voluntary exclusion from the Internet Archive would have no direct effect on your rankings in Google or Alexa. It might, as Diana suggested, be a side effect, but to me even that seems unlikely.

Do you use a robots.txt file to prevent crawlers like Alexa's from crawling your site?

This post was modified by Jonathan Aizen on 2003-05-08 20:45:37

Poster: Concerned with Copyright Date: May 9, 2003 4:15pm
Forum: web Subject: Re: Removal of Site- not from google

Hi Jonathan,

Thanks for your response. I'm glad that nothing at the Wayback Machine is intentionally blocking our site. The problem, however, remains and I'm at a complete loss as to where to turn next.

A robots.txt file (the one suggested on the Wayback Machine site) was used for a short period of time, right after I discovered that the site was being archived. This was done because I couldn't get the site removed for several weeks. (I believe the Wayback Machine site says that the robots.txt file would only affect archiving of the site and not other crawls.) It was after that, however, that the hits to the site started to plunge dramatically (we've recently been as low as 25% of our usual daily total). The robots.txt file was removed as soon as the site was removed from the archives. Does Alexa recognize the same robots.txt? If so, it might explain all of the strange things that have occurred over these past few months.

Thanks

Poster: Administrator, Curator, or Staff Jonathan Aizen Date: May 10, 2003 4:19am
Forum: web Subject: Re: Removal of Site- not from google

So let's see if we can figure out what is going on. You should know that having a robots.txt file is going to block your site from being indexed by Wayback, Alexa, Google, and all other respectful search engines and crawlers. I'm a bit confused about your robots.txt situation: you say you had one but then deleted it? If you had one, or still do, that is a good candidate for why your site has had its recent Google issues.
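Whether a robots.txt file shuts out every crawler that honours it depends on its User-agent lines, not just on the file existing. A minimal sketch using Python's standard urllib.robotparser, assuming a hypothetical blanket rule (User-agent: * with Disallow: /); the page URL and the crawler names checked are illustrative only, and this models the robots.txt logic, not any search engine's indexing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows every path for every user agent.
BLANKET_RULES = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


page = "http://www.edethics.org/index.html"
for agent in ("ia_archiver", "Googlebot", "Slurp"):
    # The "*" entry applies to any agent without its own entry, so all are refused.
    print(agent, allowed(BLANKET_RULES, agent, page))
```

With a rule like this, every compliant crawler is told to stay away, which is the situation described in the post above.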

Poster: Concerned with Copyright Date: May 10, 2003 5:45am
Forum: web Subject: Re: Removal of Site- not from google

Interesting, but that's not what is stated in the instructions for removing sites from the Wayback Machine. It states at

http://www.archive.org/about/exclude.php

that

*********************

"The Internet Archive is not interested in offering access to Web sites or other Internet documents whose authors do not want their materials in the collection. To remove your site from the Wayback Machine, place a robots.txt file at the top level of your site (e.g. www.yourdomain.com/robots.txt) and then submit your site below."

The robots.txt file will do two things:

It will remove all documents from your domain from the Wayback Machine.
It will tell us not to crawl your site in the future.

To exclude the Internet Archive's crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:

User-agent: ia_archiver
Disallow: /

********************
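The quoted rule names only the ia_archiver user agent, so a crawler that honours robots.txt but identifies itself differently finds no matching entry and, with no "*" entry present, is allowed in. A sketch with Python's standard urllib.robotparser, assuming "Googlebot" as the example of another crawler's user-agent token; it checks only the robots.txt matching logic, not how any search engine ranks or indexes a site.

```python
from urllib.robotparser import RobotFileParser

# The exclusion rule quoted above: it names only the ia_archiver user agent.
IA_ONLY_RULES = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(IA_ONLY_RULES.splitlines())

page = "http://www.edethics.org/"
# ia_archiver matches the named entry, and "Disallow: /" covers every path.
print("ia_archiver:", parser.can_fetch("ia_archiver", page))
# Other agents match no entry and there is no "*" entry, so access is allowed.
print("Googlebot:", parser.can_fetch("Googlebot", page))
```

Under this rule only the Internet Archive's crawler is refused, which matches the intent of the quoted instructions.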

The robots.txt file has since been removed from the site, but I built it using the instructions found here. Apparently, those instructions are not correct, since the file will also prevent crawls from Alexa, Google (and, it also appears, Yahoo!). Is that right?

Interestingly, the site still maintains a top listing at:
altavista.com
askjeeves.com
dogpile.com
mamma.com
msn search
teoma.com

As our site's visitors have historically come from searches on Google (and, to a lesser extent, Yahoo!), if the robots.txt file listed above keeps sites out of those crawls, that would explain the huge plunge in visits to our site since the file was placed there. As I mentioned previously, that file was removed a while ago.

Poster: Igor Ranitovic Date: May 13, 2003 2:51am
Forum: web Subject: Re: Removal of Site- not from google

Your robots.txt file is blocking ONLY Alexa's crawler (User-agent: ia_archiver).

If I understand correctly that is what you want. You do not want your site to be archived. Right?

Poster: Concerned with Copyright Date: May 13, 2003 8:47am
Forum: web Subject: Re: Removal of Site- not from google

Actually, the robots.txt file hasn't been on my site for a long time (well over a month now). I don't want the site archived; however, I do want people to find it, and therefore it should be on Alexa. Another VERY strange thing is that none of my sites (I currently have 4 up) appear on Alexa, even though I only put the robots.txt file on one of them.

Poster: Igor Ranitovic Date: May 13, 2003 8:55am
Forum: web Subject: Re: Removal of Site- not from google

Check http://www.edethics.org/robots.txt

This file exists and it is blocking Alexa's crawler. Please check with your webmaster.

On the other hand, I am confused about what you really want, so I believe the best thing is for you to give us a call. That way we can clear things up in the most productive way.

Take care,
Igor.


Poster: Concerned with Copyright Date: May 13, 2003 9:20am
Forum: web Subject: Re: Removal of Site- not from google

UGH!!!!!! I don't know why the robots.txt is there. I'm the webmaster and I had removed the thing!!! I'll go remove it again!!!! That would explain a LOT.

Anyway, I don't want the site to be archived - but I DO want it picked up by Alexa, Google, etc. Hits to the site have PLUNGED since all of this started.

Also, I host 4 different sites on the account where this site is located, and none of them appear on Alexa or in Google searches. Would the robots.txt file prevent that from happening too?

Thanks for your help.
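Crawlers look for robots.txt at the root of each host they visit, so whether one file affects all four sites depends on whether those sites are served from the same host or share a document root, which the thread doesn't say. A minimal Python sketch of how a crawler derives the robots.txt location from a page URL; the second hostname (www.example.org) is a hypothetical placeholder, not one of the poster's sites.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url.

    robots.txt is looked up per scheme and network location (host, optional
    port), always at the root path, regardless of which page is requested.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# www.example.org is a hypothetical second site used only for illustration.
for page in ("http://www.edethics.org/stories/index.html",
             "http://www.example.org/about.html"):
    print(page, "->", robots_url(page))
```

Different hostnames map to different robots.txt URLs; they only share rules if the server returns the same file for each.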

Poster: Concerned with Copyright Date: May 13, 2003 11:54am
Forum: web Subject: Re: Removal of Site- not from google

That's it! Just removed the robots.txt file from the site (I had removed it once before). Suddenly, our site is back at #1 on Google. Wow, the robots.txt file should NOT have caused a problem with Google, but now I'm CERTAIN that it did. Anyway, problem solved!

Poster: Concerned with Copyright Date: May 13, 2003 5:24pm
Forum: web Subject: Re: Removal of Site- not from google

WHAT is GOING ON!!!!! After several months of falling off Google's charts, our site suddenly appeared in the NUMBER ONE position this afternoon; I know because I personally saw it there!!!! I just checked again, and it's disappeared again (it's actually on page 11), AND typing link:www.edethics.org displays NOTHING again?!?! I do know that it appeared RIGHT after the robots.txt file was removed (again). This is VERY strange. Any ideas????

Poster: PcDevils Date: Nov 26, 2003 9:44am
Forum: web Subject: Re: Removal of Site- not from google

Did you ever consider it may be because your site sucks?

Fewer links and relevant searches are more likely to be the cause of the drop in Google. That, and people must not be searching for that exact topic as much. Get over it and stop blaming the archive for a coincidence.

Poster: Concerned with Copyright Date: Nov 26, 2003 10:37am
Forum: web Subject: Re: Removal of Site- not from google

"Did you ever consider it may be because your site sucks?"

Nope. Only your own comments do.

"Fewer links and relevant searches are more likely to be the cause of the drop in Google. That, and people must not be searching for that exact topic as much. Get over it and stop blaming the archive for a coincidence."

Your ignorance is astounding. First of all, you're replying to a post that was made a LONG time ago. Second, the matter WAS related to the archive. Third, as the problem was resolved, MONTHS ago, the site is back at the top.

Grow up.

This post was modified by Concerned with Copyright on 2003-11-26 18:37:25

Poster: Administrator, Curator, or Staff Jonathan Aizen Date: May 10, 2003 7:23am
Forum: web Subject: Re: Removal of Site- not from google

Aha,

If you put "User-agent: ia_archiver" in your robots.txt file, then Google should still be crawling your site (and it is, as shown by Diana). I don't know about Alexa; it's possible that Alexa's crawler uses the user agent ia_archiver, but that seems unlikely.

Anyway, all I can tell you for sure is that your site is not being intentionally punished. If you want to know why your site isn't coming up on Google as it used to, I suggest you contact them. Best of luck, and I'd be interested to hear what they tell you.

This post was modified by Jonathan Aizen on 2003-05-10 14:23:47

Poster: Concerned with Copyright Date: May 10, 2003 7:29am
Forum: web Subject: Re: Removal of Site- not from google

If I hear anything from Google, I'll post it here too. The last thing I heard from them was that they were "looking into the problem."

Poster: Administrator, Curator, or Staff Diana Hamilton Date: May 8, 2003 12:59pm
Forum: web Subject: Re: Removal of Site- and Google effect

Below is a relevant explanation of how Google does their ranking. It would seem that archive.org (or maybe Alexa, or both; I dunno how it works) would be weighty enough to seriously influence the ranks of the pages it touches. Since it touches scads of pages, in the broad context, the absence of its effect is noticeable once it no longer touches something.

As a netizen I love both Google and the archive, so this is fascinating to me! I guess archive.org could be the Mother of All Google Bombs :)

-Easily Amused

--
http://www.google.com/technology/
PageRank Explained

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."

Important, high-quality sites receive a higher PageRank, which Google remembers each time it conducts a search. Of course, important pages mean nothing to you if they don't match your query. So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines all aspects of the page's content (and the content of the pages linking to it) to determine if it's a good match for your query.
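The "votes weighted by the voter's own importance" idea quoted above is commonly written as the PageRank recurrence PR(p) = (1 - d)/N + d * sum of PR(q)/L(q) over every page q linking to p, where N is the number of pages, L(q) is q's out-link count, and d is a damping factor. A minimal power-iteration sketch in Python, assuming d = 0.85, a made-up four-page link graph (all page names are hypothetical), and one common choice for pages with no out-links; Google's real ranking combines far more signals, as the quote itself says.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank on a small, hand-made link graph.

    links maps each page to the pages it links to (the pages it "votes" for).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - d) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if targets:
                # A page splits its current rank evenly among its out-links,
                # so a vote from a high-ranked page is worth more.
                share = d * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                for t in pages:
                    new_rank[t] += d * rank[src] / n
        rank = new_rank
    return rank


# Hypothetical graph: "hub" stands in for a heavily linked page.
graph = {
    "hub": ["a", "b", "site"],
    "a": ["hub"],
    "b": ["hub", "site"],
    "site": ["hub"],
}
ranks_with_link = pagerank(graph)
# Same graph, except the hub no longer links to "site".
ranks_without_link = pagerank(dict(graph, hub=["a", "b"]))

print(f"site with hub link:    {ranks_with_link['site']:.3f}")
print(f"site without hub link: {ranks_without_link['site']:.3f}")
```

In this toy graph, "site" scores noticeably lower once the heavily weighted hub stops linking to it, which is the mechanism Diana speculates about above; it illustrates the idea only and says nothing about what actually happened to edethics.org.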

This post was modified by hamilton on 2003-05-08 19:59:54
